System and method for cancer detection

ABSTRACT

Nuclear Factor I/B (Nfib), a protein important to lung maturation in human embryos, is an oncogene in small cell lung cancer (SCLC). This novel bioinformatics image-processing tool analyzes digital images of biopsies stained for Nfib. First, the model was trained to determine whether the biopsy was cancerous or not; then it was trained to predict whether the biopsy represented limited stage SCLC or extensive stage SCLC. The factors considered were the amount of positive Nfib staining, the amount of negative Nfib staining, the amount of “non-tissue” areas on the slide, and the intensity of the staining itself. Overall, this tool is highly accurate, with 95.11% accuracy. Doctors can directly use this tool to accurately predict the stage of SCLC in less than one minute. This system application can allow doctors to better guide their patients' treatments of SCLC.

FIELD OF INVENTION

This disclosure describes both a system and a method to detect cancer. More specifically, it relates to software and biological data used together to detect cancer.

BACKGROUND

Lung cancer is the leading cause of cancer death for both men and women worldwide. Small cell lung cancer (SCLC), also known as oat cell carcinoma, is the most fatal and aggressive subtype of lung cancer. The five-year survival rate for stage 4 SCLC remains a dismal 2% due to the rapid onset of metastasis. Metastasis is the process by which cancer cells migrate from the primary site to secondary sites via blood vessels. There are two overall stages of SCLC: limited stage and extensive stage; extensive stage SCLC is defined by the metastasis of the cancer past the supraclavicular areas in the lung. Doctors separate the two because the treatment plans differ—surgery, radiation therapy, and chemotherapy are preferred for treating limited stage SCLC whereas chemotherapy alone is preferred for treating extensive stage SCLC. Unfortunately, if the cancer remains undetected and untreated, the average patient survival time is only 2 to 4 months. This prognosis has not advanced in nearly three decades.

Due to the excessive costs of screening tests, most patients go to the doctor only once symptoms show. The stage of cancer cannot be detected until expensive computerized tomography (CT) scans or positron emission tomography (PET) scans, two imaging tests that assess patient health, have been administered to gather further information. Although helpful, CT scans have been shown to be ineffective for SCLC screening purposes. PET scans have proved helpful in predicting the stage of the cancer, but cost thousands of dollars per scan. Currently, a biopsy is taken from the patient's lung and stained with hematoxylin and eosin (H&E stain). H&E stains are helpful in analyzing tissue biopsies, but are neither cancer-specific nor patient-specific. Light microscopy is used to analyze the stained biopsies. The standard features pathologists look for include the size, shape, and density of cells. Merkel cell carcinoma is histologically similar to SCLC, making a definitive SCLC diagnosis difficult.

Analyzing the biopsies and initiating the correct treatment can take weeks. Considering that SCLC tumors can double in size in as little as a month, doctors cannot afford to wait weeks before confirming that a patient has extensive stage or limited stage SCLC. Accurate analysis of biopsies is critical to deciding treatment plans. There is a need for a rapid, cost-efficient method for diagnosis and treatment.

SUMMARY

Several embodiments of a system, method and process to evaluate images and corroborate them with actual biological samples are disclosed. The proposed system, process and method enable medical practitioners and allied field professionals to diagnose a disease such as cancer rapidly and provide effective treatment. In one embodiment, the system is used for image analysis of biopsy samples. In another embodiment, a machine learning process is used to analyze the image results.

In one embodiment, the system and method use a software application that detects the stage of lung cancer by analyzing the digital image of the patient's standard biopsy. Nuclear Factor I/B is used for diagnosis as a biomarker representative of lung cancer metastasis.

In another embodiment, once the lab process is complete, the software application combines image analysis and calculations so that, within 1 minute, doctors can learn about the metastatic potential of a patient's tumors to accurately diagnose small cell lung cancer. A programming language such as ‘R’ was used to create this workflow and program for analyzing the samples that were cancerous.

In one embodiment, a machine learning process was used and the data is divided into two groups: a training data set, from which the machine can learn to create a model, and a testing data set, on which the model can be tested. Two levels of classification were done: pixel-by-pixel analysis and entire-biopsy image analysis.

In another embodiment, logistic regression, a machine learning regression model that is useful when working with binary variables, is used. The logistic regression curve creates a clear separation between any class 0 and any class 1 objects. In another embodiment, the results showed that cancerous tissues are denser than healthy tissues due to the increased number of cells present per unit area. Thus, the first feature taken into account when analyzing the biopsies was the amount of ‘non-tissue’ areas and the amount of ‘tissue’ areas. Twenty images representing ‘non-tissue’ areas and ‘tissue’ areas served as the training data set for the “Tissue or Non-Tissue” logistic regression classifier. The images were analyzed pixel-by-pixel to ensure that an accurate area could be calculated for each of the two types of regions.

The system, method and process disclosed herein may be implemented by any means for achieving various aspects and may be executed in a form of a machine-readable medium embodying a set of instructions that, when executed by a machine, cause the machine to perform any of the operations disclosed herein. Other features will be apparent from the accompanying drawings and from the detailed description that follows.

BRIEF DESCRIPTION OF THE DRAWINGS

Example embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIGS. 1A and 1B illustrate a flow chart of process for the system illustrating the pipeline of the analysis from the input (stained biopsy image) to the output (a predicted stage.) The models used were the “Tissue or Non-tissue” Classifier, “Healthy or Cancerous” Classifier, “Positive or Negative” Classifier, and the overall Multinomial Model, as constructed in accordance with at least one embodiment.

FIG. 2A illustrates a sample image of lung tissue section, stained for Nfib. Blue staining indicates Nfib− regions whereas brown staining indicates Nfib+ regions.

FIG. 2B illustrates a sample image of liver tissue section, stained for Nfib. Blue staining indicates Nfib− regions whereas brown staining indicates Nfib+ regions.

FIGS. 3A and 3B are graphs of tumor counts by Nfib status. Mixed tumors were defined as ones that were partially Nfib+ and partially Nfib−. n=20 mice. For the extensive stage SCLC tissues shown in FIG. 3A, there were 10 Nfib+ tumors, 61 Nfib− tumors, and 11 mixed tumors in the lung. For FIG. 3B there were 564 Nfib+ tumors, 22 Nfib− tumors, and 0 mixed tumors in the liver. The graphs indicate that, in extensive stage SCLC tissues, the vast majority of primary site tumors are Nfib− whereas the vast majority of metastatic site tumors are Nfib+. The increase in Nfib+ tumors at the metastatic site suggests a role of Nfib in SCLC metastases.

FIGS. 4A through 4C illustrate Nfib knockdown and overexpression in SCLC cell lines. The H889, 16T, and K1 cell lines were used for the knockdown experiments. In FIG. 4A, there is efficient knockdown of NFIB in the H889 shNFIB cell line when compared to the H889 shGFP control cell line. In FIG. 4B, there is efficient knockdown of Nfib in the 16T and K1 shNfib cell lines when compared to the 16T and K1 shLuciferase control cell lines. In FIG. 4C, the H29 overexpression cell line (+NFIB; the NFIB gene was inserted) had increased Nfib expression when compared to the H29 control cell line (+Empty; no gene was inserted.)

FIG. 5 illustrates that knockdown of Nfib expression led to a significant decrease in number of seeded metastases in the livers.

FIGS. 6A through 6D show that Alamar blue assays indicate Nfib overexpression led to increased proliferation and Nfib knockdown led to decreased proliferation over a period of 8 days. shLuc is short for shLuciferase.

FIG. 7A illustrates a “Tissue or Non-Tissue” Classifier Model for separation of ‘non-tissue’ areas from ‘tissue’ areas has an AUC of 0.8888.

FIG. 7B illustrates a “Positive or Negative” Classifier Model for separation of Nfib+ from Nfib− areas has an AUC of 0.9573.

Other features of the present embodiments will be apparent from the accompanying drawings and from the detailed description that follows.

DETAILED DESCRIPTION

In the instant application, a rapid, cost-effective tool to predict the stage of small cell lung cancer (SCLC) is described. SCLC spreads quickly, and current methods to determine whether it has spread require complex machinery and expensive scans. Seeking an alternative, we studied Nuclear Factor I/B (Nfib), a protein important to lung maturation, and found it was also crucial to the growth and spread of SCLC cells. We have developed an application to analyze digital images of lung biopsies stained for Nfib. We have trained this application by using a model to determine if the biopsy was cancerous and to predict whether it represented limited or extensive stage SCLC. This tool is 95-98 percent accurate and can process a scan within one minute. The R programming language was used to create a novel bioinformatics application to assess the amount of Nfib in patient lung biopsies, and, for the first time, predict whether patients have early/limited stage SCLC or extensive stage SCLC. Doctors need a tool to accurately, quickly, and cost-effectively determine the stage of SCLC.

Nuclear Factor I is a family of four transcription factors that regulate the development of many organs in humans and mice. A member of this family of proteins, Nuclear Factor I/B (Nfib), is essential for embryonic brain and lung development in humans and mice. Nfib deficiency can lead to delayed lung maturation, callosal agenesis, and forebrain defects. Nfib also plays a role in ER-negative breast cancers, which metastasize quickly, much like SCLC. The role of Nfib in predicting SCLC metastases in humans has not been researched before. Throughout this paper, NFIB is used to refer to the human gene, Nfib to the mouse gene, and Nfib to the protein.

First, the role of Nfib in human SCLC and metastasis was characterized. The amount of Nfib present in human SCLC biopsies was then used to train a machine learning model to predict whether a patient had limited stage or extensive stage SCLC.

Machine learning “refers to the ability of a system to change its behavior without being explicitly programmed.” This allows for predictive power based on past data obtained from patients. Machine learning analyses on lung cancer have primarily focused on Non-Small Cell Lung Cancer (NSCLC) and genomics. Earlier approaches have not focused on diagnosing the stage of the SCLC cancer using the immunohistochemistry stain of the biopsy alone.

This system and methods have two parts: Part 1: Investigate the role of Nfib in human SCLC, and Part 2: Design a Machine Learning tool to analyze biopsies, based on their Nfib staining, and predict whether the patient has limited or extensive stage SCLC.

The instant disclosure details the system, application and method and is the first application of its type that accurately detects the stage of lung cancer by analyzing the digital image of the patient's standard biopsy. It uses advanced machine learning algorithms and research on Nuclear Factor I/B as a biomarker representative of lung cancer metastasis. One can directly use this system and method to learn, within 1 minute, about the metastatic potential of a patient's tumors and to accurately diagnose small cell lung cancer. This application can increase patient survival from months to years and help save thousands of lives.

Role of Nfib in SCLC

Over the course of metastasis, cancer cells become metastatic, disseminate, seed at a secondary site, and grow at the secondary site. In SCLC, the main metastatic site is the liver. The expression of Nfib was studied in limited stage SCLC mouse lung biopsies and in extensive stage SCLC mouse lung and liver biopsies. The study examined whether Nfib expression was more prevalent in the primary lung sites or the metastatic liver sites. A further analysis of whether Nfib expression affected the number of seeded metastases and whether Nfib expression affected the growth of SCLC cells was then conducted.

Mouse Model Used for Tissues

Lung and liver tissues from a mouse model were used to study SCLC at the primary site and metastatic site respectively. Mice with a triple knockout (TKO) of the genes p53, p130, and RB1 were used to ensure that the mouse model closely resembled human SCLC. The term “TKO mouse” refers to mice with this triple knockout of genes and thus, mice with SCLC. This mouse model was based on previous knowledge that RB1 and p53 are inactivated in the vast majority of SCLC cases and that p130 knockout enhances model similarity to human SCLC functionally and histopathologically. The researcher did not handle the live mice and solely worked with the tissue sections.

Immunohistochemistry Analysis

Immunohistochemistry (IHC) assays were used to identify areas of tissue on the biopsies with high Nfib expression. IHC assays were carried out on 20 sets of mouse lung and liver tissue (of mice with extensive stage SCLC and of mice with limited stage SCLC.) This helped determine whether Nfib was expressed mainly in the primary lung tissues or the metastatic liver tissues in extensive stage SCLC. Cells expressing Nfib in large amounts are referred to collectively as Nfib+ and those expressing little or no Nfib are referred to collectively as Nfib−. The number of Nfib+ and Nfib− tumors was compared.

The immunohistochemistry assays were carried out at room temperature. Mouse lung and liver tissue sections were obtained following their deposition onto glass slides in 4 sections. Samples were rehydrated and boiled in citrate buffer to facilitate antigen retrieval. The lung and liver tissue sections were washed in Phosphate Buffered Saline (PBS) three times between protein blocks. The DAKO Dual Endogenous Enzyme Block was applied to block endogenous peroxidase. Avidin, biotin, and protein blocks were then added to the sections followed by an overnight incubation period in primary anti-NFIB antibody (1:1000 dilution.) Sections were then incubated with the anti-rabbit secondary antibody and then the Avidin/Biotin complex (ABC) reagent for 30 minutes each. 3,3′-diaminobenzidine (DAB) chemically reacted with the horseradish peroxidase in the Avidin/Biotin Complex, making the Nfib+ regions visibly brown. A hematoxylin counterstain and an acid alcohol solution (1% Hydrochloric acid and 70% Ethanol) were applied to stain the other tissue areas blue. Slides were dehydrated and mounted prior to examination.

Tissue Culture and Cell Lines: Small Cell Lung Cancer Cell Lines

All cells were grown in incubators at 37° C. with 5% CO2. Cells were cultured in Dulbecco's Modified Eagle Medium (DMEM) media mixed with Fetal Bovine Serum (FBS), penicillin, and streptomycin. The following cell lines were used: H889, H29, 16T, and K1. H889 and H29 are human SCLC cell lines, and 16T and K1 are mouse SCLC cell lines.

Preparing Knockdown and Overexpression Cell Lines

shRNA was used to knock down Nfib expression through RNA interference. shNfib was inserted into a plasmid vector. A separate set of plasmids was made with shLuciferase or shGFP. Luciferase and GFP served as controls because neither gene is affected in this experiment. This served as a negative control to ensure that any effect the shRNA itself had on the cells was accounted for.

To overexpress NFIB, plasmids containing all necessary parts of the Tetracycline On (TetOn) System and the Nfib gene were inserted into SCLC cells through transfection. As a negative control, a separate set of the same cells was given a plasmid with the same TetOn system but no inserted gene (denoted as Empty.) This system allowed for the overexpression of Nfib upon the addition of doxycycline.

The plasmids for both the overexpression and knockdown cell lines were then inserted into Escherichia coli bacteria (from New England Biolabs) through transformation. Polymerase Chain Reaction (PCR) was carried out and the resulting samples were run on an agarose gel (1.5%) to screen for the bacterial colonies that had the correctly sized plasmids. These plasmids were then isolated and sequenced to ensure that the gene and promoter regions had no mutations. Plasmids without mutations were packaged into lentiviruses, which were inserted into the 293T cells via transfection. The 293T cell line, derived from kidney cells and known for its proficiency in lentiviral protein production, was used for the rapid production of lentiviruses with the vector plasmids. The media with the viruses was transferred from the 293T cells to the SCLC cells, thus infecting the SCLC cells. This resulted in the overexpression or knockdown of Nfib in the SCLC cells. Doxycycline was added to the overexpression cell line to maintain NFIB overexpression.

Role of Nfib in Seeding of Metastases

Nfib expression was knocked down in one set of TKO mice, whereas it was left the same in another set of TKO mice. To determine the role of Nfib in the seeding of metastases at the liver, the number of liver metastases was compared between the control and Nfib knockdown tissues.

Role of Nfib in Growth of SCLC Using Alamar Blue Assays

Knockdown and overexpression cell lines were grown in 48-well plates. 10 milliliters of the Alamar Blue dye were added every two days beginning at day 0 (the day the cells were seeded) until day 8. After a 3-hour incubation period in the incubator, the plate was analyzed using a spectrophotometer. Absorbance levels were representative of the number of viable cells in each well; this assay gauged SCLC growth over time.

Part 2: Create a Model to Predict Whether Patient has Limited or Extensive Stage SCLC

Computational Analysis: Sample Preparation

The LC818 and LC814a human SCLC tissue arrays were obtained from US Biomax, Inc. Samples represented all stages of SCLC. All biopsies were stained for Nfib with an IHC assay then imaged at the 20× magnification using an imaging microscope. This procedure is similar to how digital biopsies are taken in a clinical setting.

The programming language R was used to code the entire classifier model. High-quality TIFF images were processed using the “rtiff” package in R. In machine learning, the data is divided into two groups: a training data set, from which the machine can learn to create a model, and a testing data set, on which the model can be tested. Two levels of classification were done: pixel-by-pixel analysis and entire-biopsy image analysis.

Logistic regression is a machine learning regression model that is useful when working with binary variables. The logistic regression curve creates a clear separation between any class 0 and any class 1 objects.
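As an illustration of the logistic regression idea described above, the following is a minimal sketch in Python rather than the R code used for the disclosed tool. It fits a one-feature logistic curve by gradient descent on made-up toy data; the function names, learning rate, and data values are assumptions introduced here for illustration and do not come from the disclosure.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # stable form for negative z
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(x, w, b, threshold=0.5):
    """Assign class 1 when the modeled probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical toy data: one numeric feature, class 0 for low values, class 1 for high.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(predict(0.1, w, b), predict(0.9, w, b))
```

The fitted curve rises from near 0 to near 1 around the boundary between the two groups, which is the separation between class 0 and class 1 objects referred to above.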

In general, cancerous tissues are denser than healthy tissues due to the increased number of cells present per unit area. Thus, the first feature taken into account when analyzing the biopsies was the amount of ‘non-tissue’ areas and the amount of ‘tissue’ areas. 20 Images representing ‘non-tissue’ areas and ‘tissue’ areas served as the training data set for the “Tissue or Non-Tissue” logistic regression classifier. The images were analyzed pixel-by-pixel to ensure that an accurate area could be calculated for each of the two types of regions.
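The pixel-by-pixel area calculation can be sketched as follows, again in Python for illustration only. The disclosure does not state which pixel features or coefficients the “Tissue or Non-Tissue” classifier uses, so the RGB-based score, the `WEIGHTS` and `BIAS` values, and the tiny sample image below are all hypothetical.

```python
import math

# Hypothetical coefficients for illustration only; real values would be
# fitted on the labeled 'tissue' and 'non-tissue' training images.
WEIGHTS = (-0.02, -0.02, -0.02)  # one weight per RGB channel
BIAS = 12.0

def tissue_probability(pixel):
    """Return the modeled probability that an (R, G, B) pixel is tissue."""
    z = BIAS + sum(w * c for w, c in zip(WEIGHTS, pixel))
    return 1.0 / (1.0 + math.exp(-z))

def count_regions(image, threshold=0.5):
    """Count tissue and non-tissue pixels in a 2-D grid of RGB tuples."""
    tissue = non_tissue = 0
    for row in image:
        for pixel in row:
            if tissue_probability(pixel) >= threshold:
                tissue += 1
            else:
                non_tissue += 1
    return tissue, non_tissue

# Made-up 2x2 image: bright background pixels and darker stained pixels.
image = [
    [(255, 255, 255), (150, 90, 40)],   # white background, brown stain
    [(90, 110, 180), (250, 250, 250)],  # blue stain, near-white background
]
tissue_px, non_tissue_px = count_regions(image)
print(tissue_px, non_tissue_px)
```

Counting classified pixels in this way gives an area for each of the two region types, which is the quantity the later steps rely on.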

To analyze whether the patient had SCLC, the overall model was trained to check if the tissue was cancerous or not. Healthy human lung tissue biopsies in the LC2085c tissue array were obtained from US Biomax, Inc. and were split into two groups by a 70:30 ratio to serve as the training data set and testing data set respectively. SCLC lung tissue biopsies from the same tissue array were also split into a 70:30 ratio to serve as the training data set and the testing data set respectively. The image was analyzed as a whole, and not on a pixel-by-pixel level, because the entire biopsy was either cancerous or healthy. Since cancerous tissues are much more dense, they would have distinctively fewer ‘non-tissue’ pixels. Thus, tissues that were healthy were separated from tissues that were cancerous with a logistic regression model based on the amount of ‘non-tissue’ pixels present. The model formed was called the “Healthy or Cancerous” classifier.
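The 70:30 split described above, applied to each class separately, might look like the following Python sketch. The image names, class sizes, and random seeds are placeholders invented for illustration; the disclosure does not specify how biopsies were assigned to each group.

```python
import random

def split_70_30(items, seed=0):
    """Shuffle a copy of the items and split it 70:30 into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = (7 * len(shuffled)) // 10  # integer arithmetic avoids rounding surprises
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder image identifiers (hypothetical counts).
healthy = [f"healthy_{i}" for i in range(10)]
cancerous = [f"sclc_{i}" for i in range(20)]

# Split each class on its own so both appear in training and testing.
healthy_train, healthy_test = split_70_30(healthy, seed=1)
sclc_train, sclc_test = split_70_30(cancerous, seed=2)

# Label 0 = healthy, 1 = cancerous.
train = [(name, 0) for name in healthy_train] + [(name, 1) for name in sclc_train]
test = [(name, 0) for name in healthy_test] + [(name, 1) for name in sclc_test]
print(len(train), len(test))
```

Splitting within each class keeps the healthy-to-cancerous ratio the same in the training and testing sets.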

Next, the overall model was trained to find the amount of Nfib+ and Nfib− staining on the biopsy because Nfib proved important to SCLC metastasis in Part 1 of this research. Known Nfib+ and known Nfib− images were used to train the “Positive or Negative” logistic regression classifier. Only pixels classified as ‘tissue’ from the “Tissue or Non-tissue” logistic regression classifier were considered in this step.

Finally, the intensity of the staining was considered because a darker staining indicates the presence of more Nfib. The staining levels were separated into nine categories, ranging from a “maximum” Nfib+ region to a “maximum” Nfib− region. Thus, 12 final characteristics were obtained per SCLC image: number of pixels in ‘non-tissue’ areas, number of pixels in Nfib+ stained areas, number of pixels in Nfib− stained areas, and number of pixels in each of the nine staining intensity categories.
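The 12 characteristics listed above (1 non-tissue count, 1 Nfib+ count, 1 Nfib− count, and 9 intensity counts) can be assembled as in the Python sketch below. The disclosure does not say how the nine intensity categories are bounded, so this sketch assumes equal-width bins over a signed score running from maximum Nfib− (−1) to maximum Nfib+ (+1); the pixel labels and intensity values are likewise hypothetical inputs.

```python
N_BINS = 9  # nine staining-intensity categories, per the description

def intensity_bin(label, intensity):
    """Map a stained pixel to one of nine bins (assumed equal-width).

    `intensity` is a staining strength in [0, 1]; it is signed negative for
    Nfib- pixels and positive for Nfib+ pixels, so bin 0 is the "maximum"
    Nfib- category and bin 8 the "maximum" Nfib+ category.
    """
    signed = intensity if label == "pos" else -intensity
    index = int((signed + 1.0) / 2.0 * N_BINS)
    return min(index, N_BINS - 1)  # a score of exactly +1 falls in the top bin

def build_features(pixels):
    """Return the 12-element feature list for one biopsy image."""
    non_tissue = positive = negative = 0
    bins = [0] * N_BINS
    for label, intensity in pixels:
        if label == "non_tissue":
            non_tissue += 1
            continue  # only tissue pixels carry a staining intensity
        if label == "pos":
            positive += 1
        else:
            negative += 1
        bins[intensity_bin(label, intensity)] += 1
    return [non_tissue, positive, negative] + bins

# Made-up pixels that have already been classified by the earlier steps.
pixels = [("non_tissue", 0.0), ("pos", 0.95), ("pos", 0.1), ("neg", 0.95)]
features = build_features(pixels)
print(features)
```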

A multinomial regression model was used to predict the stage of the biopsy from these 12 final characteristics. The “glmnet” package in R was used for this. The entire analysis pipeline is shown in FIGS. 1A and 1B.
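To show how a multinomial model turns the 12 characteristics into a stage, the Python sketch below applies a softmax over per-class linear scores. It only illustrates the prediction step: the coefficients and feature values are invented, the features are shown as fractions purely for readability, and the regularized fitting that glmnet performs in R is not reproduced here.

```python
import math

def softmax(scores):
    """Convert per-class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_stage(features, coefs, intercepts, labels):
    """Return the most probable label and the full probability list."""
    scores = [
        b + sum(w * x for w, x in zip(ws, features))
        for ws, b in zip(coefs, intercepts)
    ]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

labels = ["limited", "extensive"]
# Hypothetical coefficients: one row of 12 weights per class.
coefs = [
    [0.0] * 12,
    [0.0, 2.0, -2.0] + [0.0] * 9,  # rewards Nfib+ area, penalizes Nfib- area
]
intercepts = [0.0, 0.0]

# Hypothetical feature vector for one biopsy.
features = [0.1, 0.7, 0.2] + [0.0] * 9
stage, probs = predict_stage(features, coefs, intercepts, labels)
print(stage, probs)
```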

FIG. 2A illustrates a sample image of lung tissue section, stained for Nfib. Blue staining indicates Nfib− regions whereas brown staining indicates Nfib+ regions.

FIG. 2B illustrates a sample image of liver tissue section, stained for Nfib. Blue staining indicates Nfib− regions whereas brown staining indicates Nfib+ regions.

FIG. 3. Mixed tumors were defined as ones that were partially Nfib+ and partially Nfib−. n=20 mice. For the extensive stage SCLC tissues (A) there were 10 Nfib+ tumors, 61 Nfib− tumors, and 11 mixed tumors in the lung whereas (B) there were 564 Nfib+ tumors, 22 Nfib− tumors, and 0 mixed tumors in the liver. The graphs indicate that, in extensive stage SCLC tissues, the vast majority of primary site tumors are Nfib− whereas the vast majority of metastatic site tumors are Nfib+. Increase in Nfib+ tumors at the metastatic site suggests a role of Nfib in SCLC metastases.

FIG. 4. The H889, 16T, and K1 cell lines were used for the knockdown experiments. (A) There is efficient knockdown of NFIB in the H889 shNFIB cell line when compared to the H889 shGFP control cell line. (B) There is efficient knockdown of Nfib in the 16T and K1 shNfib cell lines when compared to the 16T and K1 shLuciferase control cell lines. (C) The H29 overexpression cell line (+NFIB; the NFIB gene was inserted) had increased Nfib expression when compared to the H29 control cell line (+Empty; no gene was inserted.)

FIG. 5. Knockdown of Nfib expression led to a significant decrease in number of seeded metastases in the livers.

TABLE 1. The table shows the Area Under the Curve (AUC) for each of the five cross-validation data sets. Each of the five cross-validation sets was different. The average AUC value, and thus the accuracy of the overall multinomial model, was 95.11%.

Cross Validation Set    1         2         3       4         5
AUC                     0.9487    0.9722    1.00    0.9048    0.9298
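The 95.11% figure can be checked directly from the five values in Table 1. The Python sketch below averages them and, as background, shows one standard way an AUC can be computed from classifier scores: the fraction of positive/negative pairs in which the positive example scores higher. The `auc` helper and its sample scores are illustrative assumptions, not the procedure used to produce Table 1.

```python
def auc(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# AUC values for the five cross-validation sets, copied from Table 1.
fold_aucs = [0.9487, 0.9722, 1.00, 0.9048, 0.9298]
mean_auc = sum(fold_aucs) / len(fold_aucs)
print(f"{mean_auc:.4f}")

# Made-up scores to exercise the helper: 5 of 6 pairs are ranked correctly.
example = auc([0.9, 0.8, 0.4], [0.5, 0.3])
```

The sum of the five values is 4.7555, and dividing by 5 gives 0.9511, matching the reported average.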

The present tools and methods provide an accurate, patient-specific, fast, and cost-effective means of detecting small cell lung cancer and its stage, and are quicker and easier to use than currently used tools and methods. Specifically, the present tools and methods do not require time-consuming and expensive tools like computerized tomography (CT) and positron emission tomography (PET), whose results can take weeks to be reported back to doctors and patients. Although the present assemblies and methods have been discussed for utilization with human subjects, such assemblies and methods are not so limited. It can be appreciated by those skilled in the art that the present tools and methods may be utilized for other types of cancer. In addition, while a number of specific biomedical, biotechnology and bioinformatics techniques and experiments and computer machine learning programs or methods are discussed above, the cancer detection and diagnostics tools and methods can be created in method orders other than those discussed. For example, the cancer detection tool could be used in conjunction with genetic results and human genome data.

It is to be understood that the above description is intended to be illustrative, and not restrictive. For example, the embodiments described above may be used in combination with each other for other cancers in humans and non-humans. Many other embodiments will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of legal equivalents to which such claims are entitled. In the appended claims, the term “including” is used as the plain-English equivalent of the term “comprising.” Also, in the following claims, the terms “including” and “comprising” are open-ended, that is, a system, tool, device, or method that includes elements in addition to those listed after such a term in a claim is still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.

1. System and method to diagnose small cell lung cancer and the stage (limited vs. extensive) of that cancer.
 2. Similar procedures can be implemented to diagnose other cancers that express biomarkers in differing quantities between healthy and cancerous cells due to the functional roles of these biomarkers. This invention is a reliable method to diagnose cancers based off of biomarker-specific staining and thus, this method holds true as long as there is a qualitative difference in biomarker staining between the tissues that are healthy and those that are cancerous. 